Scientific Progress

We have used this DURIP award to build our 3D Data Acquisition Platform. We have acquired major instruments, including a
Vicon MX-T40S camera system, 192 cores of HPC computational equipment, and a Trigno Wireless EMG system with 16 Trigno
EMG+XYZ sensors from Delsys. The current platform is sufficient to incorporate both motion capture devices and 3D vision sensors
to cross-validate multimodality data acquisition, and to address fundamental research problems in the representation and invariant
description of 3D data, human motion modeling and applications of human activity analysis, and computational optimization of
large-scale 3D data. The support for acquiring this equipment has significantly facilitated our current research and helps
educate scientists and engineers in areas important to national defense. We are using this platform to collect a unique
database that could be used for multimodality sensor fusion in human motion analysis, action recognition, and behavior
understanding. The impact of this award will be long-lasting, as the new facility is transforming our current research scope and, in the
meantime, supporting our current and future technology transfer.
Project Summary Sheets (PSS)

Title:  3D  Data  Acquisition  Platform  for  Human  Activity  Understanding 
Grant No: DURIP under W911NF-14-1-0516
PI:  Yun  Raymond  Fu,  Associate  Professor,  Northeastern  University,  Boston 


1. Objective

Reliable online recognition and prediction of human actions and activities in temporal sequences has
many potential applications in a wide range of Army-relevant fields, including video surveillance,
warfighter assistance, human-computer interfaces, intelligent humanoid robots, unmanned and
autonomous vehicles, and the diagnosis, assessment, and treatment of musculoskeletal disorders. A
computational approach to action prediction can extend our findings to machines and also promote
further research in human prediction and intention sensing. Clearly, a practical prediction system
must output a rapid response from partial observations. This poses a new challenge to
computational models and motivates machine learning researchers to make further progress.
Moreover, action prediction requires modeling temporal structures, which may in turn bring an important
advance for action recognition. The underlying goal of the proposal is to enhance the DoD’s
visual intelligence capabilities for automatic human activity understanding using 3D
data acquisition platforms.

2. Scientific Barriers

The use of 2D visual information captured by single or multiple cameras for human activity recognition
has been extensively studied and applied to real-world systems in the past decade. However, a
remaining open problem is how to generalize existing models and frameworks to robust,
viewpoint-independent recognition, and even prediction, of diverse human actions and activities in a
real environment. Recent advances in 3D motion capture technology, 3D depth cameras using
structured light or time-of-flight sensors, and 3D information recovery from 2D images and videos have
provided commercially viable approaches and hardware platforms for capturing 3D data in real time, and
point toward a potential breakthrough solution to this problem based on 3D data.

A computational approach to action prediction can extend these findings to machines and also
promote further research in human prediction and intention sensing. Clearly, a practical prediction
system must output a rapid response from partial observations. This poses a new challenge to
computational models and motivates machine learning researchers to make further progress.
Moreover, action prediction requires modeling temporal structures, which may in turn bring an important
advance for action recognition.
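
The requirement that a prediction system respond to partial observations can be illustrated with a
minimal sketch. It is not the project's method: it assumes each action class is represented by a single
hypothetical template sequence of scalar per-frame features, and it labels a partially observed sequence
by comparing it only against the same-length prefix of each template. The names prefix_distance,
predict_from_partial, and the example templates are all assumptions made for illustration.

```python
def prefix_distance(partial, template):
    """Mean squared difference between a partial observation and the
    same-length prefix of a class template (hypothetical scalar features)."""
    n = min(len(partial), len(template))
    if n == 0:
        raise ValueError("need at least one observed frame")
    return sum((partial[i] - template[i]) ** 2 for i in range(n)) / n


def predict_from_partial(partial, templates):
    """Return the label whose template prefix is closest to the frames
    observed so far. `templates` maps label -> full-length sequence."""
    return min(templates, key=lambda label: prefix_distance(partial, templates[label]))


# Hypothetical one-dimensional templates, for illustration only.
templates = {
    "wave": [0, 1, 0, 1, 0, 1],
    "reach": [0, 1, 2, 3, 4, 5],
}

# After three frames the two classes can already be told apart.
label = predict_from_partial([0, 1, 2], templates)
```

With only the first two frames the two hypothetical templates are identical, so any decision would be
arbitrary. This is the tension noted above: an early response must be traded off against the ambiguity
of short observations.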

3. Approach

Our 3D human data acquisition platform consists of a set of 3D motion capture sensors (e.g., Vicon)
and a set of 3D cameras (e.g., Kinect) that are synchronized and integrated to cross-validate data
acquisition, as shown in Figure 1. As illustrated in the computing (right) module, new methodologies
for 3D motion reconstruction and 3D visual modeling will be developed to bridge the gap between
vision and motion data and to form the computational component that drives interactions. The gap between
the middle-level and low-level data flows is filled by parametric and composable low-dimensional
manifold representations. Such integrated data acquisition and methodologies will link the visual
representations to quantitative biomechanical assessment of human movements in the form of
immersive activities, which aids the development of human models and assists in the progressive
parametric refinement of modeling.


[Figure: block diagram with modules labeled 3D Motion Capture, 3D Motion Reconstruction, Pose Estimation
and Activity Understanding, Quantitative Biomechanical Assessment, Interaction, Fusion, 3D Camera
Sensing, and 3D Motion Tracking.]

Fig. 2. 3D human data acquisition platform.
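
As a hedged illustration of how synchronized motion capture and 3D camera streams might be cross-validated,
the sketch below pairs each camera frame with the nearest motion capture frame by timestamp and then
reports the mean distance between corresponding joints. The function names, the frame layout (a list of
(x, y, z) joints per frame), and the sample rates in the example are assumptions, not details of the
actual platform.

```python
from bisect import bisect_left


def align_nearest(ref_times, other_times, tolerance):
    """For each reference timestamp, return the index of the nearest
    timestamp in other_times, or None if the gap exceeds tolerance.
    Both lists must be sorted in ascending order (seconds)."""
    matches = []
    for t in ref_times:
        i = bisect_left(other_times, t)
        # Candidates: the neighbors just before and just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_times)]
        if not candidates:
            matches.append(None)
            continue
        best = min(candidates, key=lambda j: abs(other_times[j] - t))
        matches.append(best if abs(other_times[best] - t) <= tolerance else None)
    return matches


def mean_joint_error(mocap_frames, camera_frames, matches):
    """Mean Euclidean distance between corresponding joints over all
    matched frame pairs; matches[k] is the mocap index for camera frame k.
    Assumes both streams use the same joint order and coordinate frame."""
    total, count = 0.0, 0
    for cam_idx, mocap_idx in enumerate(matches):
        if mocap_idx is None:
            continue
        for (x1, y1, z1), (x2, y2, z2) in zip(camera_frames[cam_idx], mocap_frames[mocap_idx]):
            total += ((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2) ** 0.5
            count += 1
    return total / count if count else None


# Hypothetical rates: camera near 30 Hz, motion capture at 100 Hz.
camera_times = [0.0, 0.033, 0.067]
mocap_times = [i / 100 for i in range(10)]
matches = align_nearest(camera_times, mocap_times, tolerance=0.005)
```

In practice the two systems would also need a shared clock and a spatial calibration between their
coordinate frames before such an error is meaningful; the sketch assumes both are already in place.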
4. Scientific Significance

Visual intelligence is now ubiquitous; yet the understanding of high-level semantic dynamics has
not been addressed as comprehensively as it is here. We propose an end-to-end solution
to the video analysis problem that integrates “best-in-class” ideas and innovates where the
state of the art is lacking. The success of this research has revealed new understanding of the
interactions and intentions in human-centered environments and helps fill the semantic gap between
visual signals and contextual reasoning. The proposed research transforms the field of visual
understanding by enabling information-rich video media to be utilized intelligently.
Research on imminent activity prediction with unconventional approaches may take
directions that challenge assumptions and have the potential to radically change established
practice. Such progress will significantly advance the visual intelligence field and
contribute to the accomplishment of the DoD’s mission. The project also impacts engineering and
science education at various levels through involvement in collaborative research projects by
students from various backgrounds, especially K-12 and undergraduate students. The
inspirational aspects have attracted young scholars to careers in science and engineering, while
promoting scientific values and progress to the broader community.


5.  Scientific  Accomplishments 

We have used this DURIP award to build our 3D Data Acquisition Platform. We have acquired
major instruments, including a Vicon MX-T40S camera system, 192 cores of HPC computational equipment,
and a Trigno Wireless EMG system with 16 Trigno EMG+XYZ sensors from Delsys. The current platform is
sufficient to incorporate both motion capture devices and 3D vision sensors to cross-validate
multimodality data acquisition, and to address fundamental research problems in the representation and
invariant description of 3D data, human motion modeling and applications of human activity analysis,
and computational optimization of large-scale 3D data. The support for acquiring this
equipment has significantly facilitated our current research and helps educate scientists and engineers in areas
important to national defense. We are using this platform to collect a unique database that could be
used for multimodality sensor fusion in human motion analysis, action recognition, and behavior
understanding. The impact of this award will be long-lasting, as the new facility is transforming our current
research scope and, in the meantime, supporting our current and future technology transfer.


In summary, during the past period of research under this award, the PI’s group has published a book
edited solely by the PI, and its research outcomes have started to be cited and used by other researchers
internationally for transitions.

• [Book] Yun Fu, Human Activity Recognition and Prediction, Springer, 2016. doi: 10.1007/978-3-319-27004-3

In particular, the PI and his team have received international recognition, awards, and honors, listed
as follows:

• Dr. Fu was elected as a member of the Global Young Academy.

• Dr. Fu was recognized by the IEEE Computational Intelligence Society (CIS) as the
recipient of the 2016 IEEE CIS Outstanding Early Career Award, for contributions to neural
computing, manifold learning, and visual intelligence.

• Dr. Fu was selected by the National Academy of Engineering (NAE) as a participant in the 2015 US
Frontiers of Engineering symposium.

• Dr. Fu was elected a Senior Member of ACM.

• Dr. Fu received a 2016 Adobe Faculty Research Award.

• Dr. Fu was promoted to the rank of Associate Professor with Tenure.

• Former Ph.D. student Kang Li’s dissertation, entitled “Video event recognition and
prediction based on temporal structure analysis”, was featured by the IEEE Signal
Processing Society eNews in 2015 at
http://www.signalprocessingsociety.org/newsletter/category/ph-d-theses/page/ll/.

• Students Shuyang Wang, Shuhui Jiang, Ming Shao, Zhengming Ding, Handong Zhao,
and Hongfu Liu received the AAAI Student Travel Award for AAAI 2016.

• Student Sheng Li received the 2015 Chinese Government Award for Outstanding
Self-Financed Students Abroad.

• Student Sheng Li received the 2015 NEU Outstanding Graduate Student Award
(the top student award at NEU).

• Student Sheng Li received the ACM SIGIR Travel Award for CIKM 2015.

• Students Handong Zhao and Hongfu Liu received the ICDM Student Travel Award for
ICDM 2015.


6.  Collaborations  and  Leveraged  Funding 

Leveraging this grant, the PI has submitted a regular ARO proposal titled “Deep Multi-Factor
Learning for Intelligent Human Identification”, which is under review and consideration. The PI
is collaborating with MIT Lincoln Laboratory on the TRECVID evaluation and achieved a top ranking. In
collaboration with the Natick Soldier Center (NSC), applying new techniques for shape analysis and
classification to these 3D data will help designers of clothing and personal protection equipment
understand and fit the Army population.

7.  Technology  Transfer 

N/A 

8.  Anticipated  Scientific  Accomplishments 

We have acquired major instruments, including a Vicon MX-T40S camera system, 192 cores of HPC
computational equipment, and a Trigno Wireless EMG system with 16 Trigno EMG+XYZ sensors from
Delsys. The platform is intended to incorporate both motion capture devices and 3D vision sensors to
cross-validate multimodality data acquisition, and to address fundamental research problems in the representation
and invariant description of 3D data, human motion modeling and applications of human activity
analysis, and computational optimization of large-scale 3D data.

9.  Future  Research  Plans 

We propose a series of novel deep learning methods for multi-factor human action recognition that address
negative factors encountered in the real world. The technical merit of deep learning is that it can make good use of large-scale
training data from thousands of hours of surveillance video and millions of mug shot photos, and can
adapt to multi-view human action recognition in low-quality surveillance environments through its flexible
model structure. This essentially mimics the human cognitive process, which processes
visual information layer by layer. The DURIP project facilitates our 3D human modeling, which could
significantly enhance system performance and thereby compensate for variations in pose and modality, low
image quality, occlusions, and noisy labels.

10.  Conclusions 

The PI has achieved significant research progress and established a strong collaboration with colleagues at
MIT LL and ARL. The accomplishments include a book publication and several major awards and
honors. This funding support provides leverage for the PI’s future funding efforts.


